Merge Join
The Join node is used to create joins between 2 tables, adding the join to the database schema. The Join node can be connected to any Select (excluding Multi Select), Preparation, and Column Operation nodes. The Join function combines columns (horizontally) from 2 tables, based on matching columns and according to the selected join type.
The Join node can be used to join tables from the datasource, or to join datasource tables with new tables generated by data cleansing and preparation functions. For example, both the Date Range and Summarize functions generate new tables; these can then be joined to the original table from the datasource. At other times, you may have tables showing similar information that might be more user friendly for data analysts if combined into one table.
Create a Join
Connect the Join node to the 2 tables to be joined. With the Join node selected, go to the Properties panel to configure the join:
Resulting Table Name: name the new table.
Join Type: choose the join type (see below).
Join Columns: for each table, select the column by which to apply the join.
Join Types
Choose from the following join types:
- Full Outer Join: returns all rows from both tables.
- Inner Join: returns all rows from both tables based on a matching column. Only returns rows where the matching column's values are identical.
- Left Outer Join: returns all rows from the first/ left table, and matching rows from the second/ right table.
- Right Outer Join: returns all rows from the second/ right table, and matching rows from the first/ left table.
- Cross Join: returns a result set by multiplying the number of rows in the first table by the number of rows in the second table. The result is a table that returns all possible combinations of all rows from both tables.
Examples
In this example, we have 2 tables containing information about manufacturers; Manufacturer Details (red highlight below) and Manufacturers (green highlight).
We want to combine the columns in each of these 2 tables to produce 1 table:
To achieve this, we add the merge function to flow and connect it to both tables using a left outer join (blue arrow below), making sure to connect the tables in the order we want to merge them.
This produces the following results set, which includes 2 Manufacturer columns (one from each table):
Finally, under Column Selection we can deselect one of the 'Manufacturer' columns:
To show only one Manufacturer column, we need to open the Column Selection window from the Properties panel and deselect one of Manufacturer columns (red highlight below), leaving us with the required results set:
In this example, the Summarize node was added to the table to produce new columns based on summarize functions:
The new Summarize columns were then joined to the original table by connecting the Join node to both the Select table and the Summarize table, and adding a right outer join: